8 research outputs found
An Algebraic Characterization of Total Input Strictly Local Functions
This paper provides an algebraic characterization of the total input strictly local functions. Simultaneous, noniterative rules of the form A→B/C_D, common in phonology, are definable as functions in this class whenever CAD represents a finite set of strings. The algebraic characterization highlights a fundamental connection between input strictly local functions and the simple class of definite string languages, as well as connections to string functions studied in the computer science literature, the definite functions and local functions. No effective decision procedure for the input strictly local maps was previously available, but one arises directly from this characterization. This work also shows that, unlike the full class, a restricted subclass is closed under composition. Additionally, some products are defined which may yield new factorization methods.
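As a minimal sketch of the idea (the rule and symbols below are hypothetical, not from the paper), a simultaneous, noniterative rewrite rule can be computed as an input strictly local function: each output symbol depends only on a bounded window of the *input* string, never on earlier outputs.

```python
def isl2_apply(string, rule):
    """Apply a left-context rewrite rule A -> B / C _ as an input
    strictly local (ISL-2) function: the output at each position
    depends only on the current and previous *input* symbols."""
    a, b, c = rule  # rewrite a as b immediately after c
    out = []
    prev = None  # left word boundary
    for sym in string:
        out.append(b if (sym == a and prev == c) else sym)
        prev = sym  # context is read from the input, not the output
    return "".join(out)
```

Reading the context from the input rather than the output is what makes the map simultaneous and noniterative: a freshly rewritten symbol can never feed a later application of the rule.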
A Logical and Computational Methodology for Exploring Systems of Phonotactic Constraints
We introduce a methodology with two components. The logical component is based on a hierarchy of classes of Subregular constraints, characterized by the kinds of features of a string a mechanism must be sensitive to in order to determine whether the string satisfies a constraint. The computational component is built around a publicly available interactive workbench that, exploiting the equivalence between logical formulae and finite-state automata, implements a theorem prover for these logics and can even algorithmically extract certain classes of constraints. Alternating between these logical and computational analyses can provide useful insight more easily than using either in isolation.
Tier-Based Strictly Local Stringsets: Perspectives from Model and Automata Theory
Defined by Heinz et al. (2011), the Tier-Based Strictly Local (TSL) class of stringsets has not previously been characterized by an abstract property that allows one to prove a stringset's membership or lack thereof. We provide here two such characterizations: a generalization of suffix substitution closure and an algorithm based on deterministic finite-state automata (DFAs). We use the former to prove closure properties of the class. Additionally, we extend the approximation and constraint-extraction algorithms of Rogers and Lambert (2019a) to account for TSL constraints, allowing for free conversion between TSL logical formulae and DFAs.
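As an illustration of the class being characterized (the tier alphabet and forbidden factors below are hypothetical, not from the paper), TSL membership can be decided by projecting a string onto its tier and then applying an ordinary strictly 2-local ban on the projection:

```python
def tsl_accepts(string, tier, forbidden):
    """Tier-based strictly local check: erase all non-tier symbols,
    then forbid the given adjacent pairs on what remains."""
    proj = [s for s in string if s in tier]
    padded = ["#"] + proj + ["#"]  # word boundaries
    return all((x, y) not in forbidden
               for x, y in zip(padded, padded[1:]))
```

For example, a toy sibilant-harmony pattern with tier {s, S} forbids disagreeing sibilants at any distance, because intervening non-tier material is erased before adjacency is checked.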
Relativized Adjacency
For each class in the piecewise-local subregular hierarchy, a relativized (tier-based) variant is defined. Algebraic as well as automata-, language-, and model-theoretic characterizations are provided for each of these relativized classes, except in cases where this is provably impossible. These various characterizations are necessarily intertwined due to the well-studied logic-automaton connection and the relationship between finite-state automata and (syntactic) semigroups. Closure properties of each class are demonstrated by using automata-theoretic methods to provide constructive proofs for the closures that do hold and giving language-theoretic counterexamples for those that do not. The net result is that, rather than merely existing as an operationally defined parallel set of classes, these relativized variants integrate cleanly with the other members of the piecewise-local subregular hierarchy from every perspective. Relativization may even prove useful in the characterization of the star-free class, as every star-free stringset is the preprojection of another (also star-free) stringset whose syntactic semigroup is not a monoid.
Extracting Subregular constraints from Regular stringsets
We introduce algorithms that, given a finite-state automaton (FSA), compute a minimal set of forbidden local factors that define a Strictly Local (SL) tight approximation of the stringset recognised by the FSA and the set of forbidden piecewise factors that define a Strictly Piecewise (SP) tight approximation of that stringset, as well as a set of co-SL factors that, together with the SL and SP factors, provide a set of purely conjunctive literal constraints defining a minimal superset of the stringset recognised by the automaton. Using these, we have built computational tools that have allowed us to reproduce, by nearly purely computational means, the work of Rogers and his co-workers (Rogers et al. 2012) in which, using a mix of computational and analytical techniques, they completely characterised, with respect to the Local and Piecewise Subregular hierarchies, the constraints on the distribution of stress in human languages that are documented in the StressTyp2 database. Our focus, in this paper, is on the algorithms and the method of their application. The phonology of stress patterns is a particularly good domain of application since, as we show here, they generally fall at the very lowest levels of complexity. We discuss these phonological results here, but do not consider their consequences in depth.
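A simplified sketch of the idea behind SL factor extraction (operating on a finite sample rather than an FSA, with a hypothetical alphabet; the paper's algorithms work on the automaton directly): collect the attested boundary-marked k-factors and forbid everything else.

```python
from itertools import product

def k_factors(word, k):
    """Boundary-marked substrings of length k."""
    w = "#" + word + "#"
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def forbidden_factors(sample, alphabet, k=2):
    """Forbidden k-factors of an SL-k grammar fitted to a sample:
    every k-factor over the (boundary-extended) alphabet that is
    never attested in the data."""
    attested = set().union(*(k_factors(w, k) for w in sample))
    universe = {"".join(p) for p in product(alphabet | {"#"}, repeat=k)}
    return universe - attested
```

The resulting grammar accepts exactly the strings built from attested factors, i.e. the tightest SL-k superset of the sample; the FSA-based version in the paper computes the analogous set for an infinite stringset.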
Typology emerges from simplicity in representations and learning
We derive well-understood and well-studied subregular classes of formal languages purely from the computational perspective of algorithmic learning problems. We parameterise the learning problem along dimensions of representation and inference strategy. Of special interest are those classes of languages whose learning algorithms are necessarily not prohibitively expensive in space and time, since learners are often exposed to adverse conditions and sparse data. Learned natural language patterns are expected to be most like the patterns in these classes, an expectation supported by previous typological and linguistic research in phonology. A second result is that the learning algorithms presented here are completely agnostic to the choice of linguistic representation. In the case of the subregular classes, the results fall out from traditional model-theoretic treatments of words and strings. The same learning algorithms, however, can be applied to model-theoretic treatments of other linguistic representations such as syntactic trees or autosegmental graphs, which opens a useful direction for future research.
Robust Identification in the Limit from Incomplete Positive Data
Intuitively, a learning algorithm is robust if it can succeed despite adverse conditions. We examine conditions under which learning algorithms for classes of formal languages are able to succeed when the data presentations are systematically incomplete; that is, when certain kinds of examples are systematically absent. One motivation comes from linguistics, where the phonotactic pattern of a language may be understood as the intersection of formal languages, each of which formalizes a distinct linguistic generalization. We examine under what conditions these generalizations can be learned when the only data available to a learner belongs to their intersection. In particular, we provide three formal definitions of robustness in the identification in the limit from positive data paradigm, and several theorems which describe the kinds of classes of formal languages which are, and are not, robustly learnable in the relevant sense. We relate these results to classes relevant to natural language phonology.
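A toy illustration of the setup (the alphabet and constraints are hypothetical): two strictly 2-local generalizations whose intersection is the only source of data, from which a learner that forbids unattested bigrams still recovers both bans, despite never seeing either constraint presented in isolation.

```python
def bigrams(word):
    """Boundary-marked adjacent pairs of a word."""
    w = "#" + word + "#"
    return {w[i:i + 2] for i in range(len(w) - 1)}

# Generalization 1 forbids "ba"; generalization 2 forbids "bb".
# The learner only ever sees strings satisfying *both* constraints.
data = ["a", "ab", "aab", "aaab"]
attested = set().union(*(bigrams(w) for w in data))

# Forbidding every unattested interior bigram recovers both
# generalizations at once from their intersection.
learned_forbidden = {"aa", "ab", "ba", "bb"} - attested
```

Whether such recovery is guaranteed in general, rather than in a lucky sample like this one, is exactly the kind of question the paper's robustness definitions make precise.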
TAYSIR Competition: Transformer+rnn: Algorithms to Yield Simple and Interpretable Representations
This article presents the content of the competition Transformers+rnn: Algorithms to Yield Simple and Interpretable Representations (TAYSIR, the Arabic word for 'simple'), an online challenge on extracting simpler models from already trained neural networks, held in Spring 2023. These neural nets were trained on sequential categorical/symbolic data. Some of these data were artificial; some came from real-world problems (such as Natural Language Processing, Bioinformatics, and Software Engineering). The trained models covered a large spectrum of architectures, from Simple Recurrent Neural Networks (SRNs) to Transformers, including Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks. No constraint was placed on the surrogate models submitted by the participants: any model working on sequential data was accepted. Two tracks were proposed: neural networks trained on Binary Classification tasks, and on Language Modeling tasks. The evaluation of the surrogate models took into account both the simplicity of the extracted model and the quality of the approximation of the original model.